AITopics | feature expectation

As humans come to rely on autonomous systems more, ensuring the transparency of such systems is important to their continued adoption. Explainable Artificial Intelligence (XAI) aims to reduce confusion and foster trust in systems by providing explanations of agent behavior. Partially observable Markov decision processes (POMDPs) provide a flexible framework capable of reasoning over transition and state uncertainty, while also being amenable to explanation. This work investigates the use of user-provided counterfactuals to generate contrastive explanations of POMDP policies. Feature expectations are used as a means of contrasting the performance of these policies. We demonstrate our approach in a Search and Rescue (SAR) setting. We analyze and discuss the associated challenges through two case studies.

explanation, feature expectation, optimal policy, (12 more...)

arXiv.org Artificial Intelligence

2403.1976

Country:

North America > United States > Colorado > Boulder County > Boulder (0.16)
North America > United States > New York > New York County > New York City (0.04)
North America > Mexico > Guanajuato (0.04)
(3 more...)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Inverse Reinforcement Learning through Structured Classification Supélec - IMS-MaLIS Research Group Nancy, France

Neural Information Processing Systems Mar-14-2024, 08:02:21 GMT

This paper addresses the inverse reinforcement learning (IRL) problem, that is, inferring a reward for which a demonstrated expert behavior is optimal. We introduce a new algorithm, SCIRL, whose principle is to use the so-called feature expectation of the expert as the parameterization of the score function of a multiclass classifier. This approach produces a reward function for which the expert policy is provably near-optimal. Contrary to most existing IRL algorithms, SCIRL does not require solving the direct RL problem. Moreover, with an appropriate heuristic, it can succeed with only trajectories sampled according to the expert behavior. This is illustrated on a car driving simulator.
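As a rough illustration of the quantity these abstracts refer to, the sketch below estimates a discounted feature expectation from sampled trajectories by Monte Carlo averaging. It is not the SCIRL algorithm and is not taken from any of the listed papers; the function names, the one-hot feature map, the toy trajectories, the discount value, and the weight vector `theta` are all assumptions made for the example.

```python
import numpy as np

def feature_expectation(trajectories, phi, gamma=0.9):
    """Monte Carlo estimate of mu = E[sum_t gamma^t * phi(s_t)].

    trajectories: list of state sequences sampled from one policy.
    phi: hypothetical feature map from a state to a 1-D array.
    """
    totals = []
    for states in trajectories:
        discounts = gamma ** np.arange(len(states))
        feats = np.array([phi(s) for s in states], dtype=float)
        # Discounted sum of feature vectors along this trajectory.
        totals.append(discounts @ feats)
    return np.mean(totals, axis=0)

def one_hot(s, n_states=3):
    """Assumed toy feature map: one-hot encoding of a discrete state."""
    v = np.zeros(n_states)
    v[s] = 1.0
    return v

# Two made-up expert trajectories over states {0, 1, 2}.
expert_trajectories = [[0, 1, 2], [0, 2, 2]]
mu_expert = feature_expectation(expert_trajectories, one_hot, gamma=0.5)

# With a reward linear in the features, r(s) = theta . phi(s),
# the expected discounted return is theta . mu.
theta = np.array([0.0, 0.0, 1.0])
expected_return = float(theta @ mu_expert)
```

Under a linear reward assumption, two policies can be compared through their feature expectations alone, which is why the quantity recurs across the IRL and explanation work listed on this page.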

algorithm, feature expectation, reward function, (15 more...)

Neural Information Processing Systems

Country: Europe > France > Grand Est > Meurthe-et-Moselle > Nancy (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Active Third-Person Imitation Learning

Klein, Timo, Weinberger, Susanna, Singla, Adish, Tschiatschek, Sebastian

arXiv.org Machine Learning Dec-26-2023

We consider the problem of third-person imitation learning with the additional challenge that the learner must select the perspective from which they observe the expert. In our setting, each perspective provides only limited information about the expert's behavior, and the learning agent must carefully select and combine information from different perspectives to achieve competitive performance. This setting is inspired by real-world imitation learning applications, e.g., in robotics, a robot might observe a human demonstrator via camera and receive information from different perspectives depending on the camera's position. We formalize the aforementioned active third-person imitation learning problem, theoretically analyze its characteristics, and propose a generative adversarial network-based active learning approach. Empirically, we demonstrate that our proposed approach can effectively learn from expert demonstrations and explore the importance of different architectural choices for the learner's performance.

discriminator, information, learner, (14 more...)

arXiv.org Machine Learning

2312.16365

Country:

Europe > Austria > Vienna (0.14)
Europe > United Kingdom > England > Greater London > London (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(3 more...)

Genre:

Overview (0.93)
Research Report > New Finding (0.68)
Instructional Material > Course Syllabus & Notes (0.54)

Industry:

Leisure & Entertainment (0.68)
Education (0.66)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.94)

Joint Path planning and Power Allocation of a Cellular-Connected UAV using Apprenticeship Learning via Deep Inverse Reinforcement Learning

Shamsoshoara, Alireza, Lotfi, Fatemeh, Mousavi, Sajad, Afghah, Fatemeh, Guvenc, Ismail

arXiv.org Artificial Intelligence Jun-15-2023

This paper investigates an interference-aware joint path planning and power allocation mechanism for a cellular-connected unmanned aerial vehicle (UAV) in a sparse suburban environment. The UAV's goal is to fly from an initial point and reach a destination point by moving along the cells to guarantee the required quality of service (QoS). In particular, the UAV aims to maximize its uplink throughput and minimize the level of interference to the ground user equipment (UEs) connected to the neighbor cellular BSs, considering the shortest path and flight resource limitation. Expert knowledge is used to experience the scenario and define the desired behavior for the sake of the agent (i.e., UAV) training. To solve the problem, an apprenticeship learning method is utilized via inverse reinforcement learning (IRL) based on both Q-learning and deep reinforcement learning (DRL). The performance of this method is compared to learning from a demonstration technique called behavioral cloning (BC) using a supervised learning approach. Simulation and numerical results show that the proposed approach can achieve expert-level performance. We also demonstrate that, unlike the BC technique, the performance of our proposed approach does not degrade in unseen situations.

machine learning, reinforcement learning, uav, (17 more...)

arXiv.org Artificial Intelligence

2306.10071

Country:

North America > United States > North Carolina > Wake County > Raleigh (0.04)
North America > United States > Massachusetts > Suffolk County > Boston (0.04)
North America > United States > Arizona > Coconino County > Flagstaff (0.04)

Genre: Research Report > New Finding (0.48)

Industry:

Telecommunications (1.00)
Energy (0.67)
Education (0.67)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles > Drones (0.66)

Learning Safety Constraints from Demonstrations with Unknown Rewards

Lindner, David, Chen, Xin, Tschiatschek, Sebastian, Hofmann, Katja, Krause, Andreas

arXiv.org Artificial Intelligence May-25-2023

We propose Convex Constraint Learning for Reinforcement Learning (CoCoRL), a novel approach for inferring shared constraints in a Constrained Markov Decision Process (CMDP) from a set of safe demonstrations with possibly different reward functions. While previous work is limited to demonstrations with known rewards or fully known environment dynamics, CoCoRL can learn constraints from demonstrations with different unknown rewards without knowledge of the environment dynamics. CoCoRL constructs a convex safe set based on demonstrations, which provably guarantees safety even for potentially sub-optimal (but safe) demonstrations. For near-optimal demonstrations, CoCoRL converges to the true safe set with no policy regret. We evaluate CoCoRL in tabular environments and a continuous driving simulation with multiple constraints. CoCoRL learns constraints that lead to safe driving behavior and that can be transferred to different tasks and environments. In contrast, alternative methods based on Inverse Reinforcement Learning (IRL) often exhibit poor performance and learn unsafe policies.

demonstration, machine learning, reinforcement learning, (16 more...)

arXiv.org Artificial Intelligence

2305.16147

Country:

Europe > Switzerland > Zürich > Zürich (0.04)
Europe > Austria > Vienna (0.04)
North America > Canada > Ontario > Waterloo Region > Waterloo (0.04)
(4 more...)

Genre:

Research Report > New Finding (0.67)
Research Report > Promising Solution (0.66)

Industry:

Automobiles & Trucks (0.67)
Transportation > Ground > Road (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

IRL with Partial Observations using the Principle of Uncertain Maximum Entropy

Bogert, Kenneth, Gui, Yikang, Doshi, Prashant

arXiv.org Artificial Intelligence Aug-14-2022

The principle of maximum entropy is a broadly applicable technique for computing a distribution with the least amount of information possible while constrained to match empirically estimated feature expectations. However, in many real-world applications that use noisy sensors, computing the feature expectations may be challenging due to partial observation of the relevant model variables. For example, a robot performing apprenticeship learning may lose sight of the agent it is learning from due to environmental occlusion. We show that in generalizing the principle of maximum entropy to these types of scenarios, we unavoidably introduce a dependency on the learned model to the empirical feature expectations. We introduce the principle of uncertain maximum entropy and present an expectation-maximization based solution generalized from the principle of latent maximum entropy. Finally, we experimentally demonstrate the improved robustness to noisy data offered by our technique in a maximum causal entropy inverse reinforcement learning domain.
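For orientation, the standard fully observed maximum entropy problem that this abstract starts from can be written as below. This is the textbook formulation, not the paper's uncertain maximum entropy extension; the symbols $P$, $\phi_k$, $\hat{\phi}_k$, and $\lambda_k$ are notation assumed here rather than taken from the paper.

```latex
\begin{aligned}
\max_{P} \quad & -\sum_{x} P(x) \log P(x) \\
\text{s.t.} \quad & \sum_{x} P(x)\, \phi_k(x) = \hat{\phi}_k, \qquad k = 1, \dots, K, \\
& \sum_{x} P(x) = 1,
\end{aligned}
\qquad\Longrightarrow\qquad
P(x) \propto \exp\!\Big( \sum_{k=1}^{K} \lambda_k\, \phi_k(x) \Big)
```

Here $\hat{\phi}_k$ is the empirically estimated expectation of feature $\phi_k$, and the $\lambda_k$ are the Lagrange multipliers of the matching constraints. When $x$ is only partially observed, $\hat{\phi}_k$ can no longer be computed directly from data, which is the difficulty the abstract describes.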

algorithm, entropy, maximum entropy, (16 more...)

arXiv.org Artificial Intelligence

2208.06988

Country:

North America > United States > North Carolina > Buncombe County > Asheville (0.04)
North America > United States > New York > New York County > New York City (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Maximum Entropy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

A Hierarchical Bayesian model for Inverse RL in Partially-Controlled Environments

Bogert, Kenneth, Doshi, Prashant

arXiv.org Artificial Intelligence Jul-12-2021

Robots learning from observations in the real world using inverse reinforcement learning (IRL) may encounter objects or agents in the environment, other than the expert, that cause nuisance observations during the demonstration. These confounding elements are typically removed in fully-controlled environments such as virtual simulations or lab settings. When complete removal is impossible the nuisance observations must be filtered out. However, identifying the source of observations when large amounts of observations are made is difficult. To address this, we present a hierarchical Bayesian model that incorporates both the expert's and the confounding elements' observations thereby explicitly modeling the diverse observations a robot may receive. We extend an existing IRL algorithm originally designed to work under partial occlusion of the expert to consider the diverse observations. In a simulated robotic sorting domain containing both occlusion and confounding elements, we demonstrate the model's effectiveness. In particular, our technique outperforms several other comparative methods, second only to having perfect knowledge of the subject's trajectory.

artificial intelligence, machine learning, trajectory, (16 more...)

arXiv.org Artificial Intelligence

2107.05818

Country:

North America > United States > North Carolina > Buncombe County > Asheville (0.14)
North America > United States > Georgia > Clarke County > Athens (0.14)

Genre: Research Report (0.82)

Teaching Inverse Reinforcement Learners via Features and Demonstrations

Learner-aware Teaching: Inverse Reinforcement Learning with Preferences and Constraints

9308b0d6e5898366a4a986bc33f3d3e7-Paper.pdf

Leveraging Counterfactual Paths for Contrastive Explanations of POMDP Policies
